Finite-State Morphological Analysis Of Persian
Abstract
This paper describes a two-level morphological analyzer for Persian built with a system based on the Xerox finite-state tools. The Persian language presents certain challenges to computational analysis: there is a complex verbal conjugation paradigm that includes long-distance morphological dependencies; phonological alternations apply at morpheme boundaries; and word and noun phrase boundaries are difficult to define, since morphemes may be detached from their stems and distinct words can appear without an intervening space. In this work, we elaborate on these problems and provide solutions in a finite-state morphology system.
Similar resources
Unification-Based Persian Morphology
We present a complete formalization of Persian inflectional morphology using a unification-based framework. The morphological analyzer was developed for use in a Persian-English machine translation system; it computes the part of speech categories and returns all syntactically relevant inflectional features for a word. The morphological analyses are represented as feature structures, which can ...
Applying Finite State Morphology to Conversion Between Roman and Perso-Arabic Writing Systems
This paper presents a method for converting back and forth between the Perso-Arabic and Romanized writing systems for Persian. Given a word in one writing system, we use finite state transducers to generate a morphological analysis for the word, which is subsequently used to regenerate the orthography of the word in the other writing system. The system has been implemented in XFST and LEXC.
Low-Density Language Bootstrapping: the Case of Tajiki Persian
Low-density languages raise difficulties for standard approaches to natural language processing that depend on large online corpora. Using Persian as a case study, we propose a novel method for bootstrapping MT capability for a low-density language in the case where it relates to a higher density variant. Tajiki Persian is a low-density language that uses the Cyrillic alphabet, while Iranian Pe...
Implementing Urdu Grammar as Open Source Software
Urdu is a challenging language because of, first, its Perso-Arabic script; second, its morphological system, which has inherent grammatical forms and vocabulary of Arabic, Persian and the native languages of South Asia; and third, its pragmatically neutral constituent order (SOV, Subject Object Verb). Today, the state-of-the-art technology to write grammars (morphology + syntax) is to use special-purpose ...
A Persian Part-Of-Speech Tagger Based on Morphological Analysis
This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech (POS) tagging system. This is a main part of a process for expanding a large Persian corpus called Peykare (or Textual Corpus of Persian Language). Peykare is arranged into two parts: annotated and unannotated. We use the annotated part in order to create an automatic morphological analyze...